
[BugFix] Fix legacy logger propagation leaking unformatted logs to stderr#7617

Open
gongweibao wants to merge 3 commits into PaddlePaddle:develop from gongweibao:addtimestamp

Conversation


gongweibao (Collaborator) commented Apr 24, 2026

Motivation

Legacy loggers (legacy.*) created by get_trace_logger() and _get_legacy_logger() had propagate left as True (the propagate = False line has been commented out since PR #3431). As a result, logs at every level (INFO/WARNING/ERROR) propagated up to the Python root logger, which writes to stderr in the default LEVEL:name:message format, with no timestamps.

Impact:

  • stderr log files (*-stderr.log) are polluted with INFO-level logs that shouldn't be there
  • ERROR logs in stderr have no timestamps, causing downstream tools (e.g. paddlerl diagnose) to display (no ts) and fail to filter events by time

Root cause: PR #3431 introduced the legacy.* namespace (logging.getLogger(f"legacy.{name}")) but kept logger.propagate = False commented out with the note "maintain original logic". The original code didn't need it because loggers were at the top level. After the namespace change, legacy.api_server became a child of the root logger, and propagation leaked all logs to stderr.
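The leak described above can be reproduced with the standard library alone; a minimal sketch (the logger names legacy.demo_leaky / legacy.demo_fixed are illustrative, not FastDeploy's actual ones):

```python
import logging

# A namespaced logger such as "legacy.api_server" is a child of the root
# logger. With propagate=True (the default), every record it emits is
# re-handled by the root logger's handlers.
captured = []

class _Capture(logging.Handler):
    """Stands in for the root stderr handler so we can inspect the leak."""
    def emit(self, record):
        captured.append(record.getMessage())

root_handler = _Capture()
logging.getLogger().addHandler(root_handler)

# Pre-fix state: propagate defaults to True, so records leak up to root.
leaky = logging.getLogger("legacy.demo_leaky")
leaky.setLevel(logging.INFO)
leaky.addHandler(logging.NullHandler())
leaky.info("leaks to root")

# Post-fix state: propagate=False keeps records in the logger's own handlers.
fixed = logging.getLogger("legacy.demo_fixed")
fixed.setLevel(logging.INFO)
fixed.propagate = False
fixed.addHandler(logging.NullHandler())
fixed.info("stays local")

logging.getLogger().removeHandler(root_handler)
assert captured == ["leaks to root"]
```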

Modifications

  • Uncomment logger.propagate = False in both get_trace_logger() and _get_legacy_logger()
  • Update test assertion from assertTrue to assertFalse to match the fix

Usage or Command

No usage change. After this fix:

  • stderr only contains ERROR-level logs from channel loggers (with proper %(asctime)s timestamps)
  • INFO/WARNING logs from legacy loggers no longer leak to stderr

Accuracy Tests

N/A — logging-only change, no impact on model outputs.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Test updated to match fix, 109 tests all passing.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

…derr

Legacy loggers (legacy.*) had propagate=True (the propagate=False line was
commented out in PaddlePaddle#3431), so logs at every level propagated to the
root logger and were written to stderr without timestamps. This left
downstream log parsers unable to extract timestamps from stderr log files.

Set propagate=False for both get_trace_logger and _get_legacy_logger so
logs only go through their configured handlers with proper formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 24, 2026 13:34

paddle-bot Bot commented Apr 24, 2026

Thanks for your contribution!

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes an issue where the legacy.* loggers created by get_trace_logger() / _get_legacy_logger() propagate to the root logger by default, leaking logs to stderr in the root logger's default format (missing the expected formatting and timestamps), and isolates the legacy loggers' output from FastDeploy's logging system.

Changes:

  • Explicitly set logger.propagate = False in get_trace_logger() and _get_legacy_logger()
  • Update the unit-test assertion to match the new propagation behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
fastdeploy/logger/logger.py Disables legacy logger propagation to the root logger, preventing stderr from being polluted with unformatted logs
tests/logger/test_logger.py Updates the legacy logger propagate assertion to match the fixed behavior
Comments suppressed due to low confidence (1)

tests/logger/test_logger.py:88

  • This change also switches logger.propagate to False in get_trace_logger(), but the current test only covers the propagate flag of _get_legacy_logger(). Consider also asserting that logger.propagate is False in test_get_trace_logger_basic (or a dedicated case) to guard against future regressions.
        legacy_logger = self.logger._get_legacy_logger("test", "test.log")
        self.assertFalse(legacy_logger.propagate)

    def test_get_trace_logger_basic(self):
        """Test basic functionality of get_trace_logger"""
        logger = self.logger.get_trace_logger("test_trace", "trace_test.log")


@codecov-commenter

codecov-commenter commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@791484a). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/logger/logger.py 86.95% 1 Missing and 2 partials ⚠️
fastdeploy/entrypoints/openai/api_server.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7617   +/-   ##
==========================================
  Coverage           ?   71.68%           
==========================================
  Files              ?      419           
  Lines              ?    57886           
  Branches           ?     9078           
==========================================
  Hits               ?    41498           
  Misses             ?    13559           
  Partials           ?     2829           
Flag Coverage Δ
GPU 71.68% <91.83%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


@PaddlePaddle-bot

PaddlePaddle-bot commented Apr 27, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-04-27 16:22:19

The CI report is generated from the following code (refreshed every 30 minutes):


1 Task overview

1 Required task failed and 2 Required tasks are still running; the final status will be confirmed once they finish.

Total runs (reruns) Total tasks ✅ Passed ❌ Failed ⏳ Running ⏸️ Waiting Skipped
36(0) 36 31 1 2 2 0

2 Task status summary

2.1 Required tasks: 7/10 passed

Required tasks block merging; failures should be addressed first.

Status Task Duration Root cause Suggested fix Log Rerun
Run Four Cards Tests / run_4_cards_tests 12m29s Flaky: EOFError when paddle.load reads the routing-replay file (the file is empty or truncated) Likely a transient environment issue; please rerun Job -
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage - Running - Job -
xpu_4cards_case_test / run_xpu_4cards_cases - Running - Job -
The remaining 7 required tasks passed - - - - -

2.2 Optional tasks: 24/26 passed

Optional tasks do not block merging; failures are informational only.

Status Task Duration Log Rerun
⏸️ Run iluvatar Tests / run_iluvatar_cases - - -
⏸️ CI_HPU - - -
The remaining 24 optional tasks passed - - -

3 Failure details (Required only)

Run Four Cards Tests / run_4_cards_tests: test failure (confidence: medium)

Root cause details:
The e2e test test_GLM_45_AIR_tp4.py::test_r3_accuracy hit EOFError: Ran out of input while check_routing_replay_chat_completion deserialized a routing-replay file via paddle.framework.io.load, meaning the file was empty or not completely written. This PR only changes logger propagation configuration and stdout/stderr routing and does not touch the code that writes routing-replay data, so the failure is unrelated to the PR. This kind of EOFError is a known intermittent environment issue in distributed multi-card e2e tests (the routing-replay file is read before its write has completed).

Failed case:

Test Error Root cause
test_GLM_45_AIR_tp4.py::test_r3_accuracy EOFError: Ran out of input The routing-replay file was empty when paddle.load read it

Key log excerpt:

tests/e2e/4cards_cases/test_GLM_45_AIR_tp4.py:205: in test_r3_accuracy
tests/e2e/utils/rollout_routing_replay_test_utils.py:185: in check_routing_replay_chat_completion
/usr/local/lib/python3.10/dist-packages/paddle/framework/io.py:1275: in load
/usr/local/lib/python3.10/dist-packages/paddle/framework/restricted_unpickler.py:221: in safe_load_pickle
E   EOFError: Ran out of input
FAILED tests/e2e/4cards_cases/test_GLM_45_AIR_tp4.py::test_r3_accuracy
==================== 1 failed, 1 passed in 74.42s (0:01:14) ====================

Suggested fixes:

  1. Rerun the Job directly; this failure is unrelated to the PR's code changes
  2. If it keeps failing, inspect the write/read ordering of the routing-replay file around tests/e2e/utils/rollout_routing_replay_test_utils.py L185

Related changes: the PR modifies logger-related files such as fastdeploy/logger/logger.py and fastdeploy/logger/setup_logging.py, which have no direct bearing on this e2e routing-replay test failure

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment on lines +55 to 78
    # INFO/DEBUG logs go to stdout
    "default": {
        "class": "colorlog.StreamHandler",
        "stream": "ext://sys.stdout",
        "formatter": "custom",
    },
    # ERROR+ logs go to stderr
    "error": {
        "class": "colorlog.StreamHandler",
        "stream": "ext://sys.stderr",
        "level": "ERROR",
        "formatter": "custom",
    },
},
"loggers": {
    "uvicorn": {
        "level": "INFO",
-       "handlers": ["default"],
+       "handlers": ["default", "error"],
        "propagate": False,
    },
    "uvicorn.error": {
        "level": "INFO",
-       "handlers": ["default"],
+       "handlers": ["default", "error"],
        "propagate": False,

Copilot AI Apr 27, 2026


Under the current dictConfig, both the "default" (stdout) and "error" (stderr) handlers are attached to uvicorn / uvicorn.error, but the "default" handler does not filter by level, so ERROR+ records are written to both stdout and stderr (duplicated), which contradicts the comment "ERROR+ logs go to stderr". Consider adding a "below ERROR only" filter to the stdout handler (or an equivalent filtering scheme) so that ERROR+ records go only to stderr.
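One way to wire the suggested filter into dictConfig. This is a sketch using the stdlib logging.StreamHandler (the PR uses colorlog.StreamHandler), and "demo" is a hypothetical logger name:

```python
import logging
import logging.config

class MaxLevelFilter(logging.Filter):
    """Pass only records strictly below max_level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {
        # "()" lets dictConfig build the filter from a custom factory.
        "below_error": {"()": MaxLevelFilter, "max_level": logging.ERROR},
    },
    "handlers": {
        # INFO/WARNING only: the filter keeps ERROR+ off stdout.
        "default": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "filters": ["below_error"],
        },
        # ERROR+ only: the handler-level threshold does the gating here.
        "error": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stderr",
            "level": "ERROR",
        },
    },
    "loggers": {
        "demo": {
            "level": "INFO",
            "handlers": ["default", "error"],
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING)
```

With this layout each record is emitted exactly once: below-ERROR records pass only the stdout handler's filter, ERROR+ records pass only the stderr handler's level check.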

Comment on lines +184 to +189
# Prevent implicit basicConfig() from adding a stderr handler when
# module-level logging.info/warning() is called with no root handlers.
# A NullHandler on root satisfies Python's "has handlers" check.
if not logging.root.handlers:
logging.root.addHandler(logging.NullHandler())


Copilot AI Apr 27, 2026


Adding a NullHandler to the root logger whenever it has no handlers suppresses Python's lastResort output and turns any later logging.basicConfig() call by users or host applications into a no-op (root already has a handler). This is an intrusive change to global logging state and can silently drop warnings and errors from third-party libraries and unconfigured loggers. Consider not touching root directly: for example, fix the module-level logging.info/warning calls that reach root, attach handlers/filters only on FastDeploy's own logger tree, or gate this behind a switch and document the behavior change.

Suggested change
# Prevent implicit basicConfig() from adding a stderr handler when
# module-level logging.info/warning() is called with no root handlers.
# A NullHandler on root satisfies Python's "has handlers" check.
if not logging.root.handlers:
logging.root.addHandler(logging.NullHandler())

Comment on lines +184 to +189
# Prevent implicit basicConfig() from adding a stderr handler when
# module-level logging.info/warning() is called with no root handlers.
# A NullHandler on root satisfies Python's "has handlers" check.
if not logging.root.handlers:
logging.root.addHandler(logging.NullHandler())


Copilot AI Apr 27, 2026


The PR description only mentions restoring propagate = False for the legacy loggers and updating the test, but the change also includes: modifying setup_logging (the root NullHandler), adjusting the uvicorn log_config and output streams in the OpenAI entrypoint, and changing the scheduler_metrics_logger output stream. Consider amending the PR description to explain the motivation and blast radius of these extra changes, or splitting them into separate PRs, so reviewers and release owners do not miss them during risk assessment.


PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-04-27 15:50:01

📋 Review summary

PR overview: fixes legacy.* namespace loggers leaking unformatted logs to stderr because of propagate=True, and unifies the global stdout/stderr routing policy for logs.

Scope: fastdeploy/logger/, fastdeploy/__init__.py, fastdeploy/entrypoints/openai/, fastdeploy/engine/sched/, tests/logger/

Impact tags: Logger, Infra


📝 PR convention check

The title [BugFix] Fix legacy logger propagation leaking unformatted logs to stderr accurately describes the fix; the description is complete and well structured, with Motivation / Modifications / Usage / Accuracy Tests / Checklist sections. ✓


Issues

Level File Summary
🟡 Suggestion fastdeploy/logger/logger.py:38 _MaxLevelFilter is defined in three places; extract and reuse
🟡 Suggestion tests/logger/test_logger.py:105 test_get_trace_logger_with_console handler-count assertion may be wrong (3 → 4)
❓ Question fastdeploy/engine/sched/scheduler_metrics_logger.py:52 StreamHandler() defaults to stderr; switching to sys.stdout changes the output target, which the PR description does not mention

🟡 _MaxLevelFilter defined in three places

_MaxLevelFilter is defined independently, with an identical implementation, in each of the following:

  • fastdeploy/__init__.py (_MaxLevelFilter)
  • fastdeploy/logger/logger.py (_MaxLevelFilter, newly added in this PR)
  • fastdeploy/logger/setup_logging.py (MaxLevelFilter, with a minor difference: it accepts string levels)

Consider extracting the class into fastdeploy/logger/setup_logging.py or fastdeploy/logger/filters.py and importing it from the other modules, so the three copies cannot drift apart.
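Under the suggested consolidation, the shared module could look like this. A sketch only: the filters.py location and the exact string-level semantics are assumptions drawn from the review comment, not the PR's actual code:

```python
import logging
from typing import Union

# Hypothetical consolidated module (e.g. fastdeploy/logger/filters.py);
# the string-level handling mirrors the setup_logging.py variant
# described above.
class MaxLevelFilter(logging.Filter):
    """Pass only records strictly below max_level."""

    def __init__(self, max_level: Union[int, str]) -> None:
        super().__init__()
        if isinstance(max_level, str):
            # Accept "ERROR" as well as logging.ERROR; getLevelName()
            # maps a registered level name back to its numeric value.
            max_level = logging.getLevelName(max_level.upper())
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level
```

The other two call sites would then simply `from fastdeploy.logger.filters import MaxLevelFilter` instead of redefining the class.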


🟡 test_get_trace_logger_with_console handler-count assertion looks stale

tests/logger/test_logger.py:105 (not modified in this PR's diff):

self.assertEqual(len(logger.handlers), 3)  # main log + error log + console

_add_console_handlers (added in this PR) attaches 2 handlers (stdout_handler + stderr_handler); together with file + error_file that makes 4 handlers, which contradicts both the comment and the asserted count of 3.

If this test really does pass as one of the 109, please explain why; otherwise, consider changing the assertion to:

self.assertEqual(len(logger.handlers), 4)  # main log + error log + stdout + stderr

❓ SchedulerMetricsLogger output stream changed from stderr to stdout without mention in the description

fastdeploy/engine/sched/scheduler_metrics_logger.py

The original logging.StreamHandler() writes to stderr by default; this PR changes it to logging.StreamHandler(sys.stdout), so the metrics log now goes to stdout. Since this is a behavior change (the metrics log destination moves), consider documenting it in the PR's Modifications section so downstream tools that scrape metrics from stderr can assess the impact.
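The default-stream behavior this question refers to is a stdlib fact that can be checked directly (standalone sketch, not FastDeploy code):

```python
import logging
import sys

# StreamHandler() with no argument targets sys.stderr; passing
# sys.stdout explicitly redirects the records to stdout instead.
default_handler = logging.StreamHandler()
stdout_handler = logging.StreamHandler(sys.stdout)

assert default_handler.stream is sys.stderr
assert stdout_handler.stream is sys.stdout
```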


Overall assessment

The core fix (un-commenting logger.propagate = False) is the right direction, and isolating the legacy namespace up front via setup_logging adds useful defense in depth. Before merging, confirm whether the handler count in test_get_trace_logger_with_console matches expectations, and add a note about the SchedulerMetricsLogger output-stream change to the PR description.

_LOG_FORMAT = "%(levelname)-8s %(asctime)s %(process)-5s %(filename)s[line:%(lineno)d] %(message)s"


class _MaxLevelFilter(logging.Filter):

🟡 Suggestion: _MaxLevelFilter is defined once each in this file (logger.py), in fastdeploy/__init__.py, and in fastdeploy/logger/setup_logging.py (there named MaxLevelFilter), with identical implementations: three copies of the same class.

Consider moving the class into fastdeploy/logger/setup_logging.py or a dedicated fastdeploy/logger/filters.py and importing it everywhere else, so the copies cannot fall out of sync during later maintenance.
